Learning with Many Irrelevant Features
Authors
Abstract
In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((1/ε) ln(1/δ) + (1/ε)[2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. The paper also presents a quasi-polynomial time algorithm, FOCUS, which implements MIN-FEATURES. Experimental studies are presented that compare FOCUS to the ID3 and FRINGE algorithms. These experiments show that, contrary to expectations, these algorithms do not implement good approximations of MIN-FEATURES. The coverage, sample complexity, and generalization performance of FOCUS are substantially better than those of either ID3 or FRINGE on learning problems where the MIN-FEATURES bias is appropriate. This suggests that, in practical applications, training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
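As a rough illustration of the MIN-FEATURES bias, the sketch below enumerates feature subsets in order of increasing size and returns the smallest subset on which the training data remain consistent (no two examples with different labels agree on every selected feature). This naive enumeration is exponential in the worst case, unlike the paper's quasi-polynomial FOCUS algorithm; the function names and toy data are illustrative assumptions, not taken from the paper.

    # Minimal sketch of a smallest-consistent-feature-subset search,
    # assuming binary feature vectors and labels given as Python lists.
    from itertools import combinations

    def sufficient(subset, X, y):
        # A subset is sufficient if no two examples with different
        # labels agree on every feature in the subset.
        seen = {}
        for xi, yi in zip(X, y):
            key = tuple(xi[j] for j in subset)
            if key in seen and seen[key] != yi:
                return False
            seen[key] = yi
        return True

    def min_features(X, y):
        # Examine subsets in order of increasing size and return the
        # first (hence smallest) subset consistent with the data.
        n = len(X[0])
        for size in range(n + 1):
            for subset in combinations(range(n), size):
                if sufficient(subset, X, y):
                    return subset
        return tuple(range(n))

    # Toy example: only feature 0 is relevant; features 1 and 2 are noise.
    X = [(0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 0, 0)]
    y = [0, 0, 1, 1]
    print(min_features(X, y))   # -> (0,)

Any consistent hypothesis can then be learned over the returned subset alone, which is the preprocessing step the abstract recommends before running ID3 or FRINGE.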
Similar references
A Classification Learning Algorithm Robust to Irrelevant Features
Abstract. The presence of irrelevant features is a fact of life in many real-world applications of classification learning. Although nearest-neighbor classification algorithms have emerged as a promising approach to machine learning tasks with their high predictive accuracy, they are adversely affected by the presence of such irrelevant features. In this paper, we describe a recently proposed c...
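As a quick way to see the effect described above, the following sketch (an illustration only, not the algorithm proposed in this reference) trains a 1-nearest-neighbor classifier on data whose label depends on a single feature and measures how test accuracy degrades as uniformly random irrelevant features are appended; NumPy and scikit-learn are assumed to be available.

    # Irrelevant features dilute Euclidean distances, so 1-NN accuracy
    # drops even though the label depends only on the first feature.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)

    def accuracy_with_noise(n_irrelevant, n_train=200, n_test=200):
        n = n_train + n_test
        relevant = rng.uniform(-1, 1, size=(n, 1))
        noise = rng.uniform(-1, 1, size=(n, n_irrelevant))
        X = np.hstack([relevant, noise])
        y = (relevant[:, 0] > 0).astype(int)
        clf = KNeighborsClassifier(n_neighbors=1)
        clf.fit(X[:n_train], y[:n_train])
        return clf.score(X[n_train:], y[n_train:])

    for k in (0, 5, 20, 100):
        print(k, "irrelevant features -> accuracy", accuracy_with_noise(k))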
Learning Boolean Concepts in the Presence of Many Irrelevant Features
In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias in Boolean domains. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((1/ε) ln(1/δ) + (1/ε)[2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relev...
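As a rough numerical illustration of this logarithmic dependence (using hypothetical values, not figures from the paper): with ε = δ = 0.1 and p = 5, the p ln n term grows only from about 5 ln 100 ≈ 23 to 5 ln 10000 ≈ 46 as the number of available features n increases from 100 to 10,000, while the 2^p = 32 term does not change at all.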
Feature Subset Selection using Rough Sets for High Dimensional Data
Abstract. Feature Selection (FS) is applied to reduce the number of features in many applications where data has multiple features. FS is an essential step in successful data mining applications, which can effectively reduce data dimensionality by removing t...
Features Selection and Rule Removal for Frequent Association Rule Based Classification
The performance of association rule based classification is notably degraded by the presence of irrelevant and redundant features and complex attributes. Association rule mining also tends to generate a large volume of rules, many of which are neither interesting nor useful. Thus, selecting relevant features and/or removing unrelated rules can significantly improve the association ru...
Feature selection, L1 vs. L2 regularization, and rotational invariance
We consider supervised learning in the presence of very many irrelevant features, and study two different regularization methods for preventing overfitting. Focusing on logistic regression, we show that using L1 regularization of the parameters, the sample complexity (i.e., the number of training examples required to learn "well") grows only logarithmically in the number of irrelevant features...
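A minimal sketch of that comparison, assuming scikit-learn is available; the synthetic data generator and constants below are illustrative choices, not the experimental setup of the paper. It fits logistic regression with an L1 and with an L2 penalty on data where only 3 of 503 features matter and reports how many coefficients each solution leaves nonzero (the L1 fit typically zeroes out most of the irrelevant ones).

    # Contrast L1 vs. L2 regularization with many irrelevant features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_samples, n_relevant, n_irrelevant = 100, 3, 500
    X = rng.normal(size=(n_samples, n_relevant + n_irrelevant))
    # Only the first n_relevant columns influence the label.
    logits = X[:, :n_relevant] @ np.ones(n_relevant)
    y = (logits + 0.1 * rng.normal(size=n_samples) > 0).astype(int)

    for penalty in ("l1", "l2"):
        clf = LogisticRegression(penalty=penalty, C=0.5, solver="liblinear")
        clf.fit(X, y)
        nonzero = int(np.sum(np.abs(clf.coef_) > 1e-6))
        print(penalty, "-> nonzero coefficients:", nonzero)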